This notebook tries the [semi-supervised methods][semi] described in the scikit-learn documentation. Label propagation spreads the labels of the training data onto the unlabeled test data, so the test segments take part in fitting as unlabeled points.
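
To make the idea concrete, here is a minimal sketch of label propagation in plain NumPy. It assumes an RBF affinity and hard clamping of the known labels; it is an illustration only, not scikit-learn's implementation, and `propagate_labels` and the toy data are made up for this example.

```python
import numpy as np

def propagate_labels(X, y, gamma=1.0, n_iter=100):
    """Toy label propagation (hypothetical helper); y uses -1 for unlabeled rows."""
    labeled = y != -1
    classes = np.unique(y[labeled])
    # Pairwise squared distances and RBF affinities between all rows.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    W = np.exp(-gamma * d2)
    # Row-normalise so each row is a distribution over neighbours.
    T = W / W.sum(axis=1, keepdims=True)
    # One-hot label matrix; unlabeled rows start at zero.
    Y0 = np.zeros((X.shape[0], classes.size))
    for k, c in enumerate(classes):
        Y0[y == c, k] = 1.0
    F = Y0.copy()
    for _ in range(n_iter):
        F = T @ F                 # spread label mass along the graph
        F[labeled] = Y0[labeled]  # clamp the known labels
    F /= F.sum(axis=1, keepdims=True)
    return classes, F

# Two clusters on a line, one labeled point in each.
X_toy = np.array([[0.0], [0.2], [0.4], [2.0], [2.2], [2.4]])
y_toy = np.array([0, -1, -1, 1, -1, -1])
classes, proba = propagate_labels(X_toy, y_toy)
predicted = classes[proba.argmax(axis=1)]
print(predicted)
```

Each unlabeled point ends up with the label of the labeled point in its own cluster, which is the behaviour we are hoping for on the test segments.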



In [1]:

    
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
%matplotlib inline
plt.rcParams['figure.figsize'] = 8, 12
plt.rcParams['axes.grid'] = True
plt.set_cmap('brg')

<matplotlib.figure.Figure at 0x7f3f09503588>



In [2]:

    
cd ..

/home/gavin/repositories/hail-seizure



In [3]:

    
from python import utils



In [4]:

    
with open("settings/testing_labelprop.json") as fh:
    settings = utils.json.load(fh)



In [5]:

    
with open("segmentMetadata.json") as fh:
    meta = utils.json.load(fh)



In [6]:

    
data = utils.get_data(settings)



In [8]:

    
da = utils.DataAssembler(settings,data,meta)

Then we just need to build training sets for each subject and apply the relevant models. Unfortunately, the cross-validator doesn't handle test segments, so we won't be able to run any informative cross-validation here.
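
One possible workaround, sketched here and not used in this notebook, would be to hide the labels of a held-out fold of training rows in the same way the test rows are hidden, and then score the propagated labels on that fold. The helper `mask_holdout` and the toy arrays below are hypothetical, and it assumes the rows are stacked as training rows followed by test rows.

```python
import numpy as np

def mask_holdout(y_train, holdout_idx, n_test):
    """Labels for stacked [train; test] rows, with hidden rows set to -1.

    Hypothetical helper: held-out training rows and all test rows are
    marked as unlabeled so a semi-supervised model cannot see them.
    """
    y = np.hstack([np.asarray(y_train, dtype=float), np.full(n_test, -1.0)])
    y[np.asarray(holdout_idx)] = -1.0
    return y

y_train_toy = np.array([0, 1, 0, 1, 1])
holdout_idx = np.array([1, 3])
y_masked = mask_holdout(y_train_toy, holdout_idx, n_test=3)
print(y_masked)
# After fitting on the stacked rows with y_masked, the predictions at
# holdout_idx could be compared with y_train_toy[holdout_idx].
```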



In [15]:

    
import sklearn.ensemble
import sklearn.preprocessing
import sklearn.semi_supervised



In [19]:

    
scaler = sklearn.preprocessing.StandardScaler()
selector = sklearn.ensemble.ExtraTreesClassifier(n_estimators=1000)
classifier = sklearn.semi_supervised.LabelPropagation()



In [23]:

    
predictions = {}
for subject in settings['SUBJECTS']:
    print("Processing " + subject)
    Xtrain, ytrain = da.build_training(subject)
    Xtest = da.build_test(subject)
    n_train = Xtrain.shape[0]
    
    # stack train and test rows; -1 marks the test rows as unlabeled
    X = np.vstack([Xtrain, Xtest])
    y = np.hstack([ytrain, np.array([-1.0]*Xtest.shape[0])])
    
    print("Fitting ExtraTree feature selection.")
    # scale all rows, then fit the selector on the scaled training rows
    X = scaler.fit_transform(X)
    selector.fit(X[:n_train], ytrain)
    
    print("Applying ExtraTree feature selection.")
    X = selector.transform(X)
    
    print("Fitting classifier.")
    # then fit the classifier on labeled and unlabeled rows together
    classifier.fit(X, y)
    
    print("Classifying test data.")
    # the test rows come after the training rows in X
    predictions[subject] = classifier.predict_proba(X)[n_train:, :]
    
    # only run the first subject for now
    break

Processing Dog_1
Fitting ExtraTree feature selection.
Applying ExtraTree feature selection.
Fitting classifier.
Classifying test data.

/home/gavin/.local/lib/python3.4/site-packages/sklearn/semi_supervised/label_propagation.py:254: RuntimeWarning: invalid value encountered in true_divide
  self.label_distributions_ /= normalizer



In [24]:

    
predictions



    Out[24]:

{'Dog_1': array([[ nan,  nan],
        [ nan,  nan],
        [ nan,  nan],
        ..., 
        [ nan,  nan],
        [ nan,  nan],
        [ nan,  nan]])}

I'm not sure why this is happening. It could be that label propagation makes an assumption I'm unaware of, and that this data violates it.
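
One candidate explanation, which is a guess and not something verified on this data, is numerical underflow in the RBF kernel. `LabelPropagation` defaults to an RBF kernel with `gamma=20`, and on standardised data with many features the squared distances between rows are large, so `exp(-gamma * d2)` can round to exactly zero. An unlabeled row with no weight to any labeled row then has a label distribution that sums to zero, and normalising it gives 0/0. The sketch below reproduces that effect in plain NumPy on random data; the feature count and the array names are made up, and it does not call scikit-learn.

```python
import numpy as np

rng = np.random.RandomState(0)
n_features = 50  # hypothetical; the real feature count is not shown above
X_demo = rng.randn(6, n_features)  # stands in for standardised features

# RBF affinities with gamma=20 (assumed to match the LabelPropagation default).
d2 = ((X_demo[:, None, :] - X_demo[None, :, :]) ** 2).sum(axis=2)
W = np.exp(-20.0 * d2)
off_diag = W[~np.eye(6, dtype=bool)]
print("largest off-diagonal weight:", off_diag.max())

# Rows 0 and 1 are labeled, rows 2 to 5 are unlabeled.
Y = np.zeros((6, 2))
Y[0, 0] = 1.0
Y[1, 1] = 1.0
F = W @ Y  # unlabeled rows receive no label mass at all

with np.errstate(invalid="ignore"):
    F_norm = F / F.sum(axis=1, keepdims=True)  # 0/0 for unlabeled rows
    # A zero weight times NaN is still NaN, so spreading F_norm again
    # contaminates every row (written without BLAS to keep this explicit).
    spread = (W[:, :, None] * F_norm[None, :, :]).sum(axis=1)

print(F_norm)
print(np.isnan(spread).all())
```

If this is the cause, things worth trying would be a much smaller `gamma`, or `kernel='knn'`, which does not depend on the distance scale. Checking `np.isnan(X).any()` after scaling would also rule out NaNs coming from the features themselves.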